SIMULATING MICROARRAYS USING A PARAMETERIZED MODEL

RELATED APPLICATION DATA

This application claims the benefit of U.S. Provisional Patent Application 
No. 60/350,326 filed January 18, 2002. 

TECHNICAL FIELD 

The technical field relates generally to microarrays and more specifically to
simulating microarrays using a parameterized model.

BACKGROUND 

Since the inception of cDNA microarray technology [1] as a high-throughput
method to gain information about gene functions and characteristics of biological
samples, many applications of the technology have been reported [2-10]. With the
improvement of the technology, including fabrication, fluorescent labeling,
hybridization, and detection, many computer software packages for extracting
signals arising from tagged mRNA hybridized to arrayed cDNA locations have been
designed and applied in various experiments [11-13]. As reported in [11], a target
detection procedure has been implemented that utilizes manually specified target
arrays, extracts the background via the image histogram, predicts target shape by
mathematical morphology, and then evaluates the intensities from each cDNA
location and its corresponding ratio quantity.

While most software packages are satisfactory for routine image analysis and
the extraction of information regarding phenomena with highly expressed genes, the
desire to discover subtle effects via microarray experiments will ultimately drive
experiments towards the limit of the technology [13], with less starting mRNA
and/or more weakly expressed genes. Weak signals and their interaction with
background fluorescent noise are most problematic. Problems include the nonlinear
trend in expression scatter plots, fishtailing at lower signal range, low measurement
quality of expression levels due to uneven local background, and small
cDNA-deposition areas. These artifacts, or sources of uncertainty, creep into higher-level
statistical data analyses, such as clustering and classification, raising concerns about
their validity.

Numerous remedies have been proposed, such as carefully designed
experiments in which duplications are used to minimize the uncertainty [14].
However, given the scarcity of certain biological samples, large duplications of
experiments are often impractical. Consequently, generating cDNA microarrays has
posed challenges.

SUMMARY 

According to one embodiment of the present invention, simulating a
microarray includes defining a number of parameters. A microarray is generated
according to the parameters using an imaging procedure. The microarray is
compared to a known value, and the imaging procedure is evaluated in response to
the comparison.

According to one embodiment of the present invention, cDNA microarrays
provide simultaneous expression measurements for thousands of genes that are the
result of processing images to recover the average signal intensity from a spot
composed of pixels covering the area upon which the cDNA detector has been put
down. The accuracy of the signal measurement depends on using an appropriate
procedure to process the images. This includes determining spot locations and
processing the data in such a way as to take into account spot geometry, background
noise, and various kinds of noise that degrade the signal. This document presents a
stochastic model for microarray images. There can be over twenty model
parameters that control the signal intensity, spot geometry, spot drift, background
effects, and the many kinds of noise that affect microarray images owing to the
manner in which they are formed. The parameters (e.g., each of them) can be
governed by a probability distribution. The model can be used to analyze the
performance of image procedures designed to measure the true signal intensity
because the ground truth (signal intensity) for each spot is known. The levels of
foreground noise, background noise, and spot distortion can be set, and procedures
can be evaluated under varying conditions.

Certain embodiments of the invention may provide technical advantages. A
technical advantage of one embodiment is providing a model to evaluate imaging
procedures and understand process interactions. Modeling and simulation of
microarray image formation is a key to benchmarking various signal processing
tools being developed to estimate cDNA signal spots. Using a model to describe the
signal ground truth not only helps in evaluating these tools, but also facilitates the
understanding of various process interactions.

Other technical advantages are readily apparent to one skilled in the art from 
the following figures, descriptions and claims. Embodiments of the invention may 
include none, some, or all of the technical advantages. 

BRIEF DESCRIPTION OF THE DRAWINGS 

For a more complete understanding of the present invention and for further
features and advantages, reference is now made to the following description, taken
in conjunction with the accompanying drawings, in which:

FIGURE 1 is a flowchart illustrating an example method for generating a 
microarray; 

FIGURE 2 illustrates background noises;

FIGURE 3 illustrates noise settings; 

FIGURE 4 illustrates an example cDNA microarray spot model; 

FIGURE 5 illustrates variability in spot size and spread from the spot size; 

FIGURE 6 illustrates inter-spot grid spacing; 

FIGURE 7 illustrates an effect of a radius drift variation; 

FIGURE 8 illustrates chord rate settings; 

FIGURE 9 illustrates an edge noise of spots; 

FIGURE 10 illustrates fluorescent detection response characteristic
functions;

FIGURE 11 illustrates possible scatter plots due to various response 
conversions for different fluorescent channels; 

FIGURE 12 illustrates increased spike noise levels; 
FIGURE 13 illustrates scratch noise and parameter settings; 
FIGURE 14 illustrates parameter settings for snake noise; 
FIGURE 15 illustrates convolution kernels;

FIGURE 16 illustrates a three-dimensional profile before and after 
smoothing; 

FIGURE 17 illustrates a full size array simulation with different parameter
settings;

FIGURE 18 illustrates a comparison between a simulated signal versus an
extracted signal from a microarray image analysis program; and

FIGURE 19 illustrates simulated images exhibiting undesirable noise 
conditions. 

DETAILED DESCRIPTION 

1. Introduction 

To improve detection and quantification of weak targets, it is important to
understand the entire process of microarray formation, from fabrication to the
scanning microscope. One example of incorporating prior knowledge into detection
methods is using the knowledge that the average intensity of the background
fluorescence is normally distributed to help design a background detection
procedure [16].

A complex electrical-optical-chemical process is involved in cDNA-microarray
technology, from fabrication of the cDNA slide, to preparing the RNA,
to hybridization, to the capture of images created from excitation of the attached
fluors. This complex process possesses multiple random factors. Images arising
from it are processed digitally to obtain the gene expression intensities and/or ratios
that quantify relative expression levels [11]. The efficacy of the analysis to be
carried out on the ratios, be it clustering [3,17-19], classification [5,10], prediction
[20,21], or some other, depends on the ability of the imaging procedure to extract
sufficiently accurate and consistent intensity levels from the spots.

As is common in imaging applications, it is difficult (or perhaps impossible)
to utilize physical ground truth as a standard by which to evaluate procedure
performance. Hence, it is common to proceed by modeling the imaging process to
simulate the various aspects of the real image process [22-24]. Image processing
procedures can be applied to the simulated process to evaluate their performance.
One might also concurrently adjust the model parameters to see how changing
various random components of the formation process impacts upon the final images,
and therefore the ability to extract meaningful information. For instance, a procedure
might have biases at low signal intensities or high noise intensities that are not
present at higher signal intensities or lower noise intensities. Here it should be
recognized that "ground truth" refers to the true signal intensity, not the actual
quantity of mRNA in the sample corresponding to the DNA in the spot.

Modeling any but a very simple physical process is a very challenging task. 
A physical process is typically influenced, directly or indirectly, by forces whose 
interrelation is unknown. The resulting model will be a random process. Each
realization of the model depends on random variables chosen according to various 
model distributions. A good quantifiable model may be required to approximate the 
physical process and to have realistic variability to describe the randomness of the 
system. 

In the present work, microarray image formation can be modeled by a series
of random processes influenced by almost two dozen parameters. The modeling
process is described in terms of the various random variables that determine spot
size, shape, and intensity, as well as variables that affect the background, including
noise. Each random variable is associated with a distribution. In some cases, one
may select the parameters of the distribution (such as mean and variance for a
normal distribution) to reflect the images of interest, such as brightness, spot size,
noise intensity, etc. In other cases, the distribution of a random variable is dependent
on the outcome of some other variable, and it is possible that the parameters
governing the distribution of a random variable may themselves be random
variables.

Although various distributions to govern the variables in the model are
postulated, one may wish to use other distributions to characterize the signal and
noise distributions. Moreover, the experimenter is free to choose the parameters of
the distributions. Microarray technology is evolving rapidly, and there are already
many variations of the technology in use. Hence, model flexibility is mandatory. For
instance, for a microarray system that does not produce doughnut holes in the spots,
the variables associated with the hole can be nullified. In the case of a stable system
in use without change for a sufficiently long period to produce a large number of
images, one can apply statistical estimation to determine some model parameters,
such as those for spot radius. Clearly, these estimates will only be of value to the
specific system from which they have been derived. Hence, they remain outside the
simulation package per se.

The simulation procedure produces spots at a preset grid of locations that 
resemble the actual microarray. Blocks can correspond to a specific pin of the
robot hand, and the inter-block variation is modeled in the simulation by allowing 
various model parameters to be randomized by block. At the start of each new block, 
the parameters of the spots can be reset. The intention of the printing process is that 
spots possess regular circular shapes. Due to mechanical fatigue, the adhesion 
process for the DNA solution concentration, and bio-chemical interactions, various 
perturbations are possible in array preparation, printing, and scanning. Various 
features of the model simulate these random perturbations. 

The simulation procedure may be implemented in a computer using any 
suitable configuration of hardware and/or software. As used in this document, the 
term "computer" refers to any suitable device operable to accept input, process the 
input according to predefined rules, and produce output, for example, a personal
computer, workstation, network computer, wireless data port, wireless telephone,
personal digital assistant, one or more processors within these or other devices, or
any other suitable processing device.

2. Simulation of cDNA Microarrays 

The simulation of the cDNA microarray images is designed for two-color
fluorescent systems with a scanning confocal microscope. A block diagram of the
overall simulation process is given in FIGURE 1, which includes four main
modules: fluorescent background simulation, simulation of cDNA target spot
generation, post-processing simulation, and TIFF image output. Each simulation
module contains many sequential steps (such as spot formation) or alternative steps
(such as different background fluorescence). Each step is discussed according to the
order in FIGURE 1 in the following subsections.

2.1. Background Simulation

The fluorescent background level is an important part of expression-level
estimation, since the additive model is routinely used to subtract the local
background from the signal intensity measurement. It is understood that when the
signal is sufficiently low, the interaction between the fluorescent background and
signal affects the estimation process in most image analysis programs, resulting in
lower measurement quality in the expression ratio. Many factors contribute to the
observed fluorescent background: auto-fluorescence from the glass surface or the
surface of the detection instrument, non-specific binding of fluorescent residues
after hybridization, local contamination from post-hybridization slide handling, etc.
A perfect system would yield a flat background possessing a normal distribution,
while a microscope without an auto-focus mechanism may produce a slanted
background level if the slides are loaded unevenly. Other extreme
hybridization conditions may cause higher non-specific hybridization to the edge of
the hybridization chamber, which effectively creates a parabolic surface of
background noise. The local contamination is left to the processing module in
Section 2.4.

The background derived from surface fluorescence upon laser excitation is
usually governed by the Poisson process, which can be approximated by a normal
distribution when the arrival rate, or the accumulation of photons, is large enough
[16]. This property can be readily assessed from the histogram of any background
region of the microarray images. Therefore, background noise is simulated by a
normal distribution whose parameters are randomly chosen to describe the process:
$I_b \sim N(\mu_b, \sigma_b)$. If multiple arrays are desired, the inter-array difference is modeled
by a uniform distribution: $\mu_b \sim U(a, b)$. $\sigma_b$ is given as a multiple of $\mu_b$: $\sigma_b = k_b \mu_b$.
Typically, $\sigma_b$ is about 10% of the mean background level.
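
As an illustration of the flat-background case just described, the following is a minimal Python sketch, not the implementation of the described system. The function name, the image dimensions, and the use of the standard `random` module are assumptions made for the example; the multiplier is written `k_b` here as a reading of the garbled source symbol.

```python
import random

def simulate_flat_background(width, height, a, b, k_b, rng=None):
    """Return (mu_b, sigma_b, image) for a flat fluorescent background.

    mu_b ~ U(a, b) is drawn once per array (inter-array variation),
    sigma_b = k_b * mu_b, and every pixel is an independent draw
    from N(mu_b, sigma_b).
    """
    rng = rng or random.Random()
    mu_b = rng.uniform(a, b)
    sigma_b = k_b * mu_b
    image = [[rng.gauss(mu_b, sigma_b) for _ in range(width)]
             for _ in range(height)]
    return mu_b, sigma_b, image
```

With `k_b = 0.1` the deviation is 10% of the mean background level, matching the typical value quoted above.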

Rather than being constant across the entire microarray, the mean of the
background noise may vary owing to various scanning effects. It can take different
shapes: parabolic, positive slope, or negative slope. In this case a function $g(x, y)$ is
first generated (parabolic, positive slope, or negative slope) to form a background
surface and normal noise is added to it pixel-wise. Thus, the background intensity is
of the form $I_b \sim N(\mu_b, \sigma_b)$ with $\mu_b = \gamma g(x, y)$, where $\gamma \sim U(a, b)$ is the targeted
background noise level. Background deviation is set independently for each channel:
$\sigma_{b1} = k_{b1} \mu_b$ and $\sigma_{b2} = k_{b2} \mu_b$. FIGURE 2 shows various noise backgrounds with
$k_{b1} = k_{b2} = 0.1$.
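
The shaped-background case can be sketched in the same way. The text does not give the exact form of $g(x, y)$, so the surfaces below (a paraboloid centered on the array and left-to-right linear ramps, each scaled to the range [0.5, 1.0]) are assumptions chosen only for illustration, as are the function names.

```python
import random

def background_surface(width, height, shape):
    """Return g(x, y) as rows of values in [0.5, 1.0] (assumed scaling)."""
    surface = []
    for y in range(height):
        v = y / (height - 1) if height > 1 else 0.5
        row = []
        for x in range(width):
            u = x / (width - 1) if width > 1 else 0.5
            if shape == "parabolic":
                # Lowest at the array center, rising toward the edges.
                t = ((2 * u - 1) ** 2 + (2 * v - 1) ** 2) / 2.0
            elif shape == "positive":
                t = u            # rises from left to right
            elif shape == "negative":
                t = 1.0 - u      # falls from left to right
            else:
                raise ValueError("unknown shape: " + shape)
            row.append(0.5 + 0.5 * t)
        surface.append(row)
    return surface

def simulate_background(width, height, shape, a, b, k_b, rng=None):
    """Draw gamma ~ U(a, b), then add normal noise pixel-wise around gamma * g."""
    rng = rng or random.Random()
    gamma = rng.uniform(a, b)  # targeted background level
    image = []
    for row in background_surface(width, height, shape):
        # Per-pixel mean gamma * g(x, y); deviation is k_b times that mean.
        image.append([rng.gauss(gamma * g, k_b * gamma * g) for g in row])
    return gamma, image
```

Calling `simulate_background` once per channel with different `k_b` values would give the independent per-channel deviations mentioned above.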

In many practical examples, the non-specific hybridization at the target
location may be different from its peripheral region. Although one may have trouble
pinpointing this particular observation under normal conditions owing to signal
interference, it is sometimes unmistakable when locations assumed to be weakly
expressed, or not expressed at all, carry some non-zero readouts, or the intensity in
the center is stronger than the doughnut-ring if the printed target is
doughnut-shaped. This artifact is simulated under a gradient noise condition by allowing the
background for the center holes to be at higher levels than the signal intensities.
Hence, there is an option to use global background or local background information
to set the noise parameter for the center hole. FIGURE 3 shows the effects of using
local and global background parameters. This effect may not appear everywhere in a
simulated image; however, it is often sufficient to require appropriate procedure
design in the image analysis program to lessen the penalty. The effects of weak
targets will be further studied in later sections.

2.2. Spot Simulation 

cDNA deposition routinely follows a rigid grid defined by the robotic print
pattern. The simulation procedure produces spots at preset grid locations that
resemble the actual microarray. In principle, print-tips are manufactured uniformly;
however, their microscopic morphologies, and thus their deposition-binding
behaviors, are noticeably different. Each block corresponds to a specific print-tip of
the robot hand. To take tip variability into account, within each block the spot
variation is governed by block parameters, which themselves are random variables.
At the start of each new block, the spot parameters are reset according to these
random variables.

The key simulation of this study is devoted to the cDNA targets, which
nominally possess a circular shape. Owing to many factors, the actual shape may be
highly non-circular. The model takes various random perturbations into account: (1)
radius variation, (2) spot drifting locally, (3) center core variation, (4) chord
removal, (5) edge noise, (6) edge enhancement, (7) signal intensity, and (8) signal
response transform. FIGURE 4 shows a schematic drawing for the cDNA target
simulation. The variables in the figure are explained in the following eight
subsections.

2.2.1. Variation of Radius

Prior to distortion and noise, the cDNA deposition spot is considered to be
circular with random radius $S$. The mean of the radius is set according to the array
density and its variance relates to the consistency of spot size. $S$ is modeled by a
normal distribution having mean $\mu_s$ and variance $\sigma_s^2$, $S \sim N(\mu_s, \sigma_s)$, with the standard
deviation being a pre-determined proportion, $k_r$, of the mean, or $S \sim N(\mu_s, k_r \mu_s)$. The
radius mean is set for every block, and randomized over a small range within the
array. The block randomness of $\mu_s$ is modeled by a uniform distribution, $\mu_s \sim U(s_a, s_b)$.
FIGURE 5 shows parts of blocks with spot radii depending on the number of
spots in a block. For FIGURES 5a through 5c, the block portions are for block sizes
(10, 15), (25, 45), and (25, 45), respectively, where (col, row) denotes the number of
columns and rows within the block, respectively. Occasionally, a spot overlaps with
its neighbors (FIGURE 5c) when $k_r$ is set to a larger proportion. This situation
simulates the condition where too much cDNA solution is deposited and/or the
drying process may be slow in comparison to the liquid spreading process.
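
The two-level radius model (a block-level mean, then per-spot radii) can be sketched as follows. This is a minimal illustration under stated assumptions: the bound names `s_a` and `s_b` for the uniform range, the function name, and the clamping of negative draws to zero are not taken from the source.

```python
import random

def simulate_block_radii(n_spots, s_a, s_b, k_r, rng=None):
    """Return (mu_s, radii) for one block of spots.

    mu_s ~ U(s_a, s_b) is reset once per block; each spot radius is then
    S ~ N(mu_s, k_r * mu_s).
    """
    rng = rng or random.Random()
    mu_s = rng.uniform(s_a, s_b)     # block-level mean radius
    sigma_s = k_r * mu_s             # deviation as a proportion of the mean
    # Clamp at zero so a rare negative draw does not yield a negative radius
    # (an assumption; the text does not say how such draws are handled).
    radii = [max(0.0, rng.gauss(mu_s, sigma_s)) for _ in range(n_spots)]
    return mu_s, radii
```

Raising `k_r` widens the spread of radii, which is the setting under which neighboring spots start to overlap.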

Depending on the robot arm and printing ability of the pins, the inter-spot
distance, $G_{sp}$, may vary. Owing to the physical mechanics of the robot arm, the
block size (pixel units) is fixed in most cases. The inter-spot distance can be set to
accommodate spot size and random variation in spot radii. The effects are illustrated
in FIGURE 6, where the number of rows and columns are fixed.

2.2.2. Spot Drift 

During the fabrication stage, the deposition of cDNA targets may not follow
the pre-defined grid, owing to print-tip rotation, vibration, or other mechanical
causes. Other drifts are attributed to the slide's coating properties and the drying
rates of the cDNA. This displacement is modeled by possible random translations in
the horizontal and vertical directions. Each spot has an equal probability, $P_D$, of
drifting. If a spot is selected for drift, then the amounts of drift in both directions are
random multiples of the current spot radius. The horizontal and vertical multiples, $\delta_x$
and $\delta_y$, called the "drift levels," are uniformly distributed: $\delta_x, \delta_y \sim U(d_a, d_b)$. The
horizontal and vertical drifts are $D_x = \delta_x S$ and $D_y = \delta_y S$, respectively. Inter-spot
distance can be set according to the drift to minimize the impact of overlapping
spots.
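
A short Python sketch of the drift rule described above follows. The function name and argument names are hypothetical; the logic is only the Bernoulli selection followed by two uniform drift levels scaled by the spot radius.

```python
import random

def drift_spot(center, radius, p_drift, d_a, d_b, rng=None):
    """Return the (possibly drifted) spot center.

    With probability p_drift the spot drifts by D_x = delta_x * S and
    D_y = delta_y * S, where delta_x, delta_y ~ U(d_a, d_b) and S is the
    current spot radius.
    """
    rng = rng or random.Random()
    x, y = center
    if rng.random() < p_drift:
        delta_x = rng.uniform(d_a, d_b)   # horizontal drift level
        delta_y = rng.uniform(d_a, d_b)   # vertical drift level
        x += delta_x * radius
        y += delta_y * radius
    return (x, y)
```

For example, `drift_spot((50, 50), 8, 0.2, -0.3, 0.3)` would move roughly one spot in five by at most 30% of its radius in each direction.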

Some microarray scanners capture two fluorescent signals in two passes of 
scanning. Due to the mechanical homing error, the two fluorescent channels may not 
align exactly. In these settings, some small offset between the two channels can be 
observed. This offset may occur at sub-pixel resolution. To simulate this offset, the 
model offers a random offset between the centers of the two channels. It is achieved
by randomly offsetting the spot center of the second channel by one pixel in the
horizontal or vertical direction. These offsets are applied following
application of the spot drifts in the first channel. FIGURE 7 illustrates the spot drift.

It is essential for the image analysis procedure to determine the exact
location of the target spot so that an accurate measurement can be carried out
without the interference of the dusty noise around the targets. Some procedures rely
on the assumption that the printing grid is rigid with the cDNA target in the center;
others assume an imperfect printing process such that a deformable grid may be
needed. The former method is faster and noise insensitive, but may be inaccurate if
the slides are fabricated with many displacements; the latter is robust in target
position detection, but can be rather slow and noise sensitive. In either case, the
simulation outcome will provide a set of evaluation images to assess the tolerance of
both procedure designs. The slightly misaligned channels also pose a challenge to
signal intensity extraction.

2.2.3. Doughnut Hole 

Owing to the impact of the print-tip on the glass surface, or possibly due to
the effect of surface tension during the drying process, a significantly lesser amount
of cDNA can be deposited in, or attached to, the center of the targets. Consequently,
the center of the target emits fewer fluorescent photons, thereby giving a target the
doughnut shape. It is critical for signal intensity extraction whether or not the center
hole is assumed, particularly when the signal is weak and there is a large center hole.
The simulation allows one hole in the center with varying size, along with a possible
off-center displacement. It is not necessary to simulate more than one hole, since the
mathematical properties for signal and noise estimation are preserved with this
simple condition.

An elliptical shape models the inner core with random horizontal and vertical
axes, $H$ and $V$. The axes are modeled by a normal distribution whose parameters are
randomized for each block within a given array: $H \sim N(\mu_H, \sigma_H)$ and $V \sim N(\mu_V, \sigma_V)$.
Inter-array variability in these radius distributions is modeled by uniformly
distributed means: $\mu_H \sim U(a_H, b_H)$, $\sigma_H = \alpha_1 \mu_H$ and $\mu_V \sim U(a_V, b_V)$, $\sigma_V = \alpha_2 \mu_V$, where
the controlling ratios vary over a range, $\alpha_1, \alpha_2 \sim U(p_a, p_b)$. The choice of the
parameters governs the hole shapes. The center position of a hole is allowed to drift
over a range. The shape is unaffected by the drift because the mechanical print-tip to
surface contact is unaffected. The amount of drift in the horizontal and vertical
directions is modeled similarly to spot drift. Drift levels are set at every block,
$(\delta_{cxR}, \delta_{cyR})$ and $(\delta_{cxG}, \delta_{cyG})$, for both channels. The amount of drift is first selected from a
uniform range, $\delta_c \sim U[i, j]$. Channel and inter-channel drifts are modeled by a
uniform variate and set for each block: $\delta_{cxG} = \delta_c \times U[-1, 1]$, $\delta_{cyG} = \delta_c \times U[-1, 1]$,
$\delta_{cxR} = \delta_{cxG} + U[-1, 1]$, and $\delta_{cyR} = \delta_{cyG} + U[-1, 1]$.
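
The doughnut-hole parameters can be sampled along the following lines. This sketch reflects one reading of the formulas above, which are partly illegible in the source: the symbol names, the order in which the green and red channel drifts are derived, and the clamping of negative axis lengths are all assumptions.

```python
import random

def sample_block_hole_params(a_h, b_h, a_v, b_v, p_a, p_b, i, j, rng):
    """Draw the per-block parameters of the elliptical center hole."""
    mu_h = rng.uniform(a_h, b_h)          # mean horizontal axis
    mu_v = rng.uniform(a_v, b_v)          # mean vertical axis
    alpha_1 = rng.uniform(p_a, p_b)       # controlling ratio for sigma_H
    alpha_2 = rng.uniform(p_a, p_b)       # controlling ratio for sigma_V
    delta_c = rng.uniform(i, j)           # overall hole-drift amount
    # Assumed reading: one channel's drift is delta_c scaled by U[-1, 1];
    # the other channel adds a further U[-1, 1] inter-channel offset.
    drift_g = (delta_c * rng.uniform(-1, 1), delta_c * rng.uniform(-1, 1))
    drift_r = (drift_g[0] + rng.uniform(-1, 1),
               drift_g[1] + rng.uniform(-1, 1))
    return {"mu_h": mu_h, "sigma_h": alpha_1 * mu_h,
            "mu_v": mu_v, "sigma_v": alpha_2 * mu_v,
            "drift_g": drift_g, "drift_r": drift_r}

def sample_hole_axes(block, rng):
    """Draw the axes (H, V) of one spot's hole from the block distributions."""
    h = max(0.0, rng.gauss(block["mu_h"], block["sigma_h"]))
    v = max(0.0, rng.gauss(block["mu_v"], block["sigma_v"]))
    return h, v
```

Setting the axis ranges to zero (for example `a_h = b_h = a_v = b_v = 0`) nullifies the hole for systems that do not produce doughnut-shaped spots.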

2.2.4. Chord Removal 

Since parts of a spot can be washed off due to various physical effects during
the hybridization and washing stages, pieces of a spot may be missing. This
condition is simulated for the same reasons that the center hole is simulated. This
irregularity is modeled by randomly cutting chords from the circular spots. The
number of chords to be removed, $N_c$, for a spot is selected from a discrete
distribution, {0, 1, 2, 3, 4}, where the elements of the distribution occur with
probabilities $p_0$, $p_1$, $p_2$, $p_3$, and $p_4$, respectively. For images with very few pieces cut
off, the zero-chord probability $p_0$ is very high, and the three- and four-chord
probabilities are close to 0 (possibly equal to 0). To model inter-array variability, the
probabilities can themselves be treated as random.

Once the number of chords for a spot is determined, the distance, $L$, of each
chord center to the edge is selected from a beta distribution: $L \sim B(\alpha_L, \beta_L)$.
Inter-block variability is modeled by allowing $\alpha_L$ and $\beta_L$ to be randomly selected from
uniform distributions: $\alpha_L \sim U(a_\alpha, b_\alpha)$ and $\beta_L \sim U(a_\beta, b_\beta)$. Owing to the large family
of shapes generated by beta distributions, this provides a wide range of distributions
for $L$. Finally, the chord locations are chosen uniformly randomly according to an
angle $\theta \sim U(0, 2\pi)$. FIGURE 8 illustrates the effect of selecting increased chord
rates: (a) $p_0 = 0.70$, $p_1 = 0.30$; (b) $p_0 = 0.20$, $p_1 = 0.40$, $p_2 = 0.25$, $p_3 = 0.15$; (c) $p_0 = 0$,
$p_1 = 0.10$, $p_2 = 0.40$, $p_3 = 0.30$, $p_4 = 0.20$.
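
Chord removal can be sketched as below. The text does not state the units of $L$; the sketch assumes $L$ is a fraction of the spot radius (a beta variate lies in [0, 1]) and that a chord removes everything beyond it along its outward normal. The function names are hypothetical, and `alpha_l` and `beta_l` are taken as already drawn for the block.

```python
import math
import random

def sample_chords(probs, alpha_l, beta_l, rng=None):
    """Return a list of (depth, theta) chords for one spot.

    probs[n] is the probability of cutting n chords; depth ~ Beta(alpha_l,
    beta_l) is the distance from chord center to the edge as a fraction of
    the radius (assumed units); theta ~ U(0, 2*pi) places the chord.
    """
    rng = rng or random.Random()
    n_chords = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return [(rng.betavariate(alpha_l, beta_l), rng.uniform(0.0, 2 * math.pi))
            for _ in range(n_chords)]

def inside_after_chords(dx, dy, radius, chords):
    """True if the offset (dx, dy) from the spot center survives all cuts."""
    if dx * dx + dy * dy > radius * radius:
        return False
    for depth, theta in chords:
        # Distance of the point along the chord's outward normal.
        proj = dx * math.cos(theta) + dy * math.sin(theta)
        if proj > radius * (1.0 - depth):
            return False
    return True
```

Using the probabilities of FIGURE 8(b), `sample_chords([0.20, 0.40, 0.25, 0.15, 0.0], 2.0, 5.0)` would cut between zero and three chords per spot.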

2.2.5. Edge Noise 

Owing to the manner in which liquid dries, the spots usually do not have 
smooth edges. To provide a realistic visual effect, as well as to pose a challenge if 
edge detection procedures are under consideration, this irregular edge effect is
simulated via parameterized noise using a binary edge-noise procedure employed in 
digital document processing [25]. After determining the target shape by cutting the 
center hole, removing possible chords, and possibly creating drift, and prior to 
simulating the signal intensity, the spot is still in its binary format, and thus the 
binary edge-noise procedure can be applied directly. Edge noise is applied to both 
the outer perimeter of the spot and the inner perimeter containing the hole. 

The procedure begins by first generating a white noise image having range
[0, max intensity]. A 3 x 3 averaging filter is applied to the white-noise image to
arrive at a noise image $N$ that possesses a degree of correlation resembling the noise
characteristics of various physical processes, including printing processes. The edge
of a binary image can be considered to consist of two parts, an inner and an outer
border. In our case, the spot radius is known and so are these borders. The inner
border is formed by morphologically eroding the image by a 3 x 3 structuring
element and then subtracting the erosion from the original image. The outer border
is formed by morphologically dilating the image by a 3 x 3 structuring element and
then subtracting the original image from the dilation. To apply noise to the inner
border, a threshold, mid + $\delta$, just above the midpoint is applied to $N$, this binary image is
ANDed with the inner border of the original binary spot $S$, and the result is XORed
with $S$. Noise is applied to the outer border by thresholding $N$ just below the
midpoint (mid - $\delta$), complementing, and then ANDing with the outer border of $S$.
This noisy outer border is then ORed with the image possessing inner border noise
to yield the edge-degraded binary spot $S'$. The process is mathematically described
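by

$$S' = \left[ S \,\Delta\, \left( N_{>\mathrm{mid}+\delta} \cap B_I(S) \right) \right] \cup \left[ \left( N_{>\mathrm{mid}-\delta} \right)^{c} \cap B_O(S) \right] \qquad (1)$$

where $N_{>t}$ denotes the binary image obtained by thresholding $N$ at $t$, $B_I(S)$ and
$B_O(S)$ denote the inner and outer borders of $S$, $\delta$ controls the threshold and hence
the edge noise, and $\Delta$ denotes the symmetric difference. $\delta$ is used as the controlling
parameter. $S'$ is a binary mask giving the spatial domain of the spot. FIGURE 9
shows edge noise for various $\delta$ thresholds.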
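
The binary edge-noise steps described in this subsection can be sketched in plain Python on a small mask. This is an illustrative sketch rather than the procedure of reference [25]: the strict `>` threshold, the treatment of pixels outside the image as background, and the helper names are assumptions.

```python
import random

def box_filter_3x3(img):
    """3 x 3 averaging filter; border pixels average over available neighbors."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out

def erode(mask):
    """Binary erosion by a 3 x 3 element (outside the image is background)."""
    h, w = len(mask), len(mask[0])
    return [[all(0 <= j < h and 0 <= i < w and mask[j][i]
                 for j in (y - 1, y, y + 1) for i in (x - 1, x, x + 1))
             for x in range(w)] for y in range(h)]

def dilate(mask):
    """Binary dilation by a 3 x 3 element."""
    h, w = len(mask), len(mask[0])
    return [[any(0 <= j < h and 0 <= i < w and mask[j][i]
                 for j in (y - 1, y, y + 1) for i in (x - 1, x, x + 1))
             for x in range(w)] for y in range(h)]

def add_edge_noise(mask, delta, max_intensity=255, rng=None):
    """Return the edge-degraded binary spot S' for a binary spot mask S."""
    rng = rng or random.Random()
    h, w = len(mask), len(mask[0])
    white = [[rng.uniform(0, max_intensity) for _ in range(w)] for _ in range(h)]
    noise = box_filter_3x3(white)            # correlated noise image N
    mid = max_intensity / 2.0
    eroded, dilated = erode(mask), dilate(mask)
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = bool(mask[y][x])
            inner = s and not eroded[y][x]           # inner border pixel
            outer = dilated[y][x] and not s          # outer border pixel
            removed = inner and noise[y][x] > mid + delta
            added = outer and not (noise[y][x] > mid - delta)
            out[y][x] = (s != removed) or added      # (S XOR removed) OR added
    return out
```

Larger values of `delta` make both thresholds harder to satisfy, so fewer border pixels are flipped and the edge stays smoother.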

2.2.6. Signal Intensity 

Simulation of signal intensity is divided into three steps. First, it is assumed
that the fluor-tagged mRNAs co-hybridized to a single slide are from the same cell
type, and therefore the signals from the two fluorescent channels are supposed to be
identical, with some variation. Second, some percentage of genes may be selected as
significantly over- or under-expressed. Third, foreground noise is added to the entire
array to simulate the normal scanning integration process.

It is well known that the distribution of gene expression levels within a cell
closely follows an exponential distribution [26]. Given a microarray containing $N$
genes, the intensity levels $I_k$, for $k = 1, \ldots, N$, assumed to be related to the
expression levels of $N$ genes, are simulated by an exponential distribution. This
intensity level $I_k$ is considered to be the ground-truth signal that is not directly
measurable from the microarray, since from either biological or bio-chemical processes,
from mRNA extraction up to the hybridization process, some variation will be
introduced into measurement of final fluorescent signal strength. For each
microarray, a particular exponential distribution with mean p is first chosen (for a
detection system with gray-level up to 65,535, p is usually selected around 3000).
Then at each spot location, which is assumed to represent one unique gene, one
ground-truth signal level $I_k$ is generated from the exponential distribution. For two
observable measurements $(R_k, G_k)$ from two fluorescent channels, two numbers are
generated from a normal distribution with mean of $I_k$ and standard deviation of $\alpha I_k$,
where $\alpha$ is a pre-determined coefficient of variation, which is usually about 5% to
30% depending on the assumed biological relation between the two channels.
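
A minimal Python sketch of this ground-truth and two-channel draw follows. The function and argument names are hypothetical, and no clipping to the detector range is applied, since the text does not describe any at this step.

```python
import random

def simulate_spot_signals(n_genes, mean_intensity=3000.0, cv=0.1, rng=None):
    """Return a list of (I_k, R_k, G_k) triples, one per gene.

    I_k is exponential with the given mean (the ground truth); R_k and G_k
    are independent draws from N(I_k, cv * I_k).
    """
    rng = rng or random.Random()
    spots = []
    for _ in range(n_genes):
        truth = rng.expovariate(1.0 / mean_intensity)   # rate = 1 / mean
        red = rng.gauss(truth, cv * truth)
        green = rng.gauss(truth, cv * truth)
        spots.append((truth, red, green))
    return spots
```

Because the ground truth is returned alongside the channel values, an extraction procedure applied to the rendered image can be scored directly against `truth`.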

To include outlier expression levels that reflect certain realistic conditions
[3-10,14], one may select 5% to 10% of the spots to be either over- or
under-expressed. This condition is achieved by selecting the genes from the entire
microarray based on a probability, $p_{outlier}$ (e.g., $p_{outlier} = 0.05$ for 5% outliers), and
then selecting the targeted expression ratio for the $k$th gene,

$$t_k = 10^{\pm b_k} \qquad (2)$$

where $b_k$ satisfies a beta distribution, $b_k \sim B(1.7, 4.8)$, and where the +/- sign is
selected with equal probability. Upon obtaining a targeted expression ratio, the
procedure converts the expression intensities from the two fluorescence channels by

(3)

where $R'_k$ and $G'_k$ denote the signal values after the conversion.
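
The outlier step can be sketched as follows. The ratio draw follows Equation 2 as read from the text. The conversion formula of Equation 3 is not legible in the source, so the symmetric split used here (multiplying one channel by the square root of the ratio and dividing the other by it) is an assumption chosen only because it changes the red-to-green ratio by exactly the targeted factor.

```python
import random

def sample_target_ratio(rng):
    """Draw t_k = 10 ** (+/- b_k) with b_k ~ Beta(1.7, 4.8)."""
    b_k = rng.betavariate(1.7, 4.8)
    sign = rng.choice((1, -1))           # over- or under-expressed, equally likely
    return 10.0 ** (sign * b_k)

def apply_outliers(signals, p_outlier, rng=None):
    """signals is a list of (R_k, G_k); returns the converted (R'_k, G'_k)."""
    rng = rng or random.Random()
    converted = []
    for red, green in signals:
        if rng.random() < p_outlier:
            t_k = sample_target_ratio(rng)
            # Assumed conversion (Equation 3 is missing from the source):
            # split the ratio symmetrically between the two channels.
            red, green = red * t_k ** 0.5, green / t_k ** 0.5
        converted.append((red, green))
    return converted
```

Since a beta variate lies in [0, 1], every targeted ratio falls between 0.1 and 10.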

Upon obtaining the signal intensities for each spot, $(R'_k, G'_k)$, each pixel
within the spot binary mask derived from steps 2.2.1 to 2.2.5 is filled with the signal
intensity. Normally distributed foreground noise is then added pixel-wise. This
yields, at each pixel, the intensities $S_R = R'_k + I_{f1}$ and $S_G = G'_k + I_{f2}$, where
$I_{f1} \sim N(\mu_{R_k}, \sigma_{R_k})$, $I_{f2} \sim N(\mu_{G_k}, \sigma_{G_k})$ and $\mu_{R_k} \sim R'_k \times U[f_{a1}, f_{a2}]$,
$\sigma_{R_k} \sim \mu_{R_k} \times U[f_{c1}, f_{c2}]$, $\mu_{G_k} \sim G'_k \times U[f_{a1}, f_{a2}]$, and
$\sigma_{G_k} \sim \mu_{G_k} \times U[f_{c1}, f_{c2}]$. In the remainder of the document, $\alpha$'s
are used as appropriate for these uniformly distributed multipliers.
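
The pixel-filling step can be sketched as below. The formulas in this paragraph are badly degraded in the source, so the sketch follows one plausible reading: per-spot noise mean and deviation are drawn as uniform multiples of the spot signal, and normal noise with those parameters is added at every pixel inside the mask. The names `f_a` and `f_c` for the two uniform ranges, and the function name, are assumptions.

```python
import random

def fill_spot(mask, r_k, g_k, f_a, f_c, rng=None):
    """Return (red, green) pixel grids for one spot.

    mask is a binary grid giving the spot domain; f_a and f_c are
    (low, high) ranges for the noise-mean and noise-deviation multipliers.
    """
    rng = rng or random.Random()
    mu_r = r_k * rng.uniform(*f_a)        # foreground noise mean, red
    sigma_r = mu_r * rng.uniform(*f_c)    # foreground noise deviation, red
    mu_g = g_k * rng.uniform(*f_a)
    sigma_g = mu_g * rng.uniform(*f_c)
    red, green = [], []
    for row in mask:
        # Pixels outside the mask are left at zero; background is added elsewhere.
        red.append([r_k + rng.gauss(mu_r, sigma_r) if inside else 0.0
                    for inside in row])
        green.append([g_k + rng.gauss(mu_g, sigma_g) if inside else 0.0
                      for inside in row])
    return red, green
```

Setting both ranges to `(0.0, 0.0)` switches the foreground noise off, which is convenient for checking an extraction procedure against a noise-free spot.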

2.2.7. Channel Conditioning

Owing to various reasons, such as imprecise quantities of starting mRNA for
the two channels, different labeling efficiencies, or uneven laser powers at the
scanning stage, in actual microarray experiments there may not be equal intensities
even if two channels use exactly the same labeled mRNA. Moreover, one may not
be able to assume that the fluorescent intensity is linearly related to the expression
level. In fact, it is very difficult to determine the exact form of the response function
from expression level to intensity due to the complex combination of bio-chemistry
to photon-electronics. A family of functions that covers most of the understandable
conditions, shown in FIGURE 10, such as delayed response, saturation (which is an
embedded feature in the digital system since in general no gray-level can pass 16-bit
binary digits in a typical microarray system), and imbalanced channel intensity, is
selected. This simulation is intended to facilitate understanding as to what is the best
way for expression ratio normalization, whether linear based methods will be
sufficient or non-linear based methods may be needed. The function family is
characterized by four parameters, $(a_0, a_1, a_2, a_3)$, and the function form is given by



Having chosen a function from the family, the expression levels, R' and G', from
each fluorescent detection channel are then transformed by the detection system
response characteristic function defined by f_R(x) or f_G(x) to obtain the realistic
fluorescent intensity observed. The observed fluorescent intensities are

R''_k = f_R(R'_k)     (4)

G''_k = f_G(G'_k)     (5)

where f_R or f_G may take different parameters for each fluor-tagging system. The
simulation performs the following steps for signal placement to emulate the real
process affecting the signal spots:

1. Generate the ground-truth expression signal I_k for every gene k from an
exponential distribution. (See Section 2.2.6.)

2. Let R_k = G_k = I_k. If a self-self experiment is to be simulated, skip steps 3 and 4.

3. If an experiment with two different samples is simulated, some outlier genes are
selected and then their intensities are altered. (R', G') is obtained from (R, G) for the
genes (e.g., all the genes or some subset thereof). (See Section 2.2.6 and Equations
2 and 3.)

4. If a fluorescent system with imperfect response characteristics is simulated, the
intensities are further converted by R'' = f_R(R') and G'' = f_G(G'). (See Section 2.2.7.)

5. The actual simulated fluorescent intensities for both channels are obtained by
applying additional variation via a Normal distribution function SR = R'' + N(μ_R,
σ_R), where μ_R = α_μ R'', σ_R = α_sd μ_R, and similarly for signal G. (See Section 2.2.6.)

The scatter plots in FIGURE 11 show the effects of the channel normalization. By 
choosing different parameter sets, one can simulate many of the situations observed 
in real microarray images. 
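
The five signal-placement steps of Section 2.2.7 can be sketched per spot as below. This is a hedged illustration only. The function name `simulate_spot_signals`, the default values of `alpha_mu` and `alpha_sd`, and the identity response functions are assumptions. The symmetric split used in step 3 (red scaled by the square root of t_k, green divided by it) is one way to realize the targeted ratio and is assumed here rather than taken from the text. The noise of step 5 is drawn once per spot for brevity, whereas the text adds it pixel-wise.

```python
import math
import random


def simulate_spot_signals(n_genes, beta=3000.0, p_outlier=0.05,
                          f_r=lambda x: x, f_g=lambda x: x,
                          alpha_mu=0.05, alpha_sd=0.5,
                          self_self=False, seed=None):
    """Sketch of steps 1-5; f_r and f_g stand in for the response functions."""
    rng = random.Random(seed)

    def add_variation(x):
        # Step 5: x + N(mu, sigma) with mu = alpha_mu * x, sigma = alpha_sd * mu.
        mu = alpha_mu * x
        return x + rng.gauss(mu, alpha_sd * mu)

    spots = []
    for _ in range(n_genes):
        # Step 1: ground-truth intensity I_k ~ Exp(beta), beta being the mean.
        i_k = rng.expovariate(1.0 / beta)
        # Step 2: both channels start from the same intensity.
        r, g, t_k = i_k, i_k, 1.0
        if not self_self:
            # Step 3: alter the intensities of randomly selected outlier genes.
            if rng.random() < p_outlier:
                b_k = rng.betavariate(1.7, 4.8)
                t_k = 10.0 ** (rng.choice((1, -1)) * b_k)
                # Assumed conversion: split the ratio evenly between channels.
                r, g = r * math.sqrt(t_k), g / math.sqrt(t_k)
            # Step 4: channel conditioning by the response functions.
            r, g = f_r(r), f_g(g)
        spots.append({"I": i_k, "t": t_k, "R2": r, "G2": g,
                      "SR": add_variation(r), "SG": add_variation(g)})
    return spots


self_self_spots = simulate_spot_signals(2000, seed=7, self_self=True)
two_sample_spots = simulate_spot_signals(2000, p_outlier=0.1, seed=7)
```

Keeping the ground truth (`I`, `t`) next to the noisy values (`SR`, `SG`) is what later allows extracted intensities to be compared against known values.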

2.2.8. Edge Enhancement 

Under some fabrication conditions, such as incorrect humidity control, where
the cDNA solution tends to accumulate towards the outer edge during the drying
process, the spot edge may appear brighter than the rest of the spot. This
phenomenon is modeled by randomly enhancing the edge. The number, N_e, of pixels
from the edge to be enhanced is fixed. The enhancement, W_ed, is added to the
original intensity. W_ed satisfies a normal distribution, W_ed ~ N(μ_e, 1). Randomness
between blocks is modeled by making μ_e uniformly distributed, μ_e ~ U(l_a, l_b).
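
A minimal sketch of the edge enhancement on a single circular spot follows. The helper `enhance_edge` is hypothetical, the spot is assumed to be centered in a square grid, and drawing a fresh W_ed for every edge pixel (rather than once per spot) is an assumption of this sketch.

```python
import math
import random


def enhance_edge(spot, radius, n_e, mu_e, rng):
    """Add W_ed ~ N(mu_e, 1) to pixels within n_e pixels of the spot edge.

    spot is a square list-of-lists with the spot centered in it.
    A new grid is returned; the input is left untouched.
    """
    size = len(spot)
    c = (size - 1) / 2.0
    out = [row[:] for row in spot]
    for y in range(size):
        for x in range(size):
            d = math.hypot(x - c, y - c)
            if radius - n_e < d <= radius:
                out[y][x] += rng.gauss(mu_e, 1.0)
    return out


rng = random.Random(3)
mu_e = rng.uniform(1.0, 3.0)  # block-level draw, mu_e ~ U(l_a, l_b)
size, radius = 21, 10
spot = [[100.0 if math.hypot(x - 10, y - 10) <= radius else 0.0
         for x in range(size)] for y in range(size)]
enhanced = enhance_edge(spot, radius, n_e=3, mu_e=mu_e, rng=rng)
```

Only the outer ring of the spot changes; the interior and the surrounding background keep their original values.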

2.3. Post-processing Simulation

Most post-processing steps simulate handling and scanning artifacts: scratch
noise resulting from improper handling of microarray slides, spike noise arising
from the impurity of mRNA extraction steps or perhaps insufficient washing
conditions, snake noise due to the accumulation of dust if the slides have sat in open
space too long, and last, but not least, smoothing resulting from many scanners'
averaging effects or integration processes. For the most part, these steps model the
interaction between signal and noise in the spatial domain, which causes pixel-wise
non-linear degradation. It is expected that the microarray image analysis software
shall be able to handle most of the noise conditions outlined here in order to measure
the signal precisely.

2.3.1 Spike Noise 

In a practical biology laboratory, it is not necessary to maintain a dust-free
environment. Hence, fine microscopic dust particles are nearly impossible to avoid.
On laser excitation, these particles fluoresce to give high intensity spikes. Moreover,
in some cases, bad mixtures of cDNA solutions result in precipitation, and these
particles fluoresce with a very high intensity. These effects are simulated by adding
spike noise at a preset rate. Such intensity spikes are added randomly across the
entire slide area, the number of such noise pixels being preset in terms of the total
number of pixels in the array. The amount of spike noise in an array is set with
reference to the percentage, L_spi, of the total number of pixels in the array. Typical
low to high noise levels are set by selecting 0.1% to 10%. Once a pixel is selected
for spike noise, the adjacent pixels have a higher probability of being affected. Thus,
a random number, W_spi, of pixels are chosen in an arbitrary direction to be
influenced by this noise. The intensity, N_s, of the spike noise is governed by an
exponential distribution with mean μ_spi. In FIGURE 12, the exponential mean is
fixed but the spike rate is increased through the parts of the figure.
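
The spike-noise step can be sketched as follows. This is an illustration under assumptions: `add_spike_noise` and `max_cluster` are hypothetical names, the rate is interpreted as the fraction of pixels chosen as spike seeds, spikes are added on top of the existing intensity, and values are clipped to the 16-bit ceiling mentioned in Section 2.2.7.

```python
import random

# Eight neighbor directions; one is picked at random for each spike cluster.
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1),
              (-1, 0), (-1, -1), (0, -1), (1, -1)]


def add_spike_noise(image, rate, mean_intensity, max_cluster=3, seed=None):
    """Return a copy of image (list of rows) with exponential spikes added."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    n_seeds = int(round(rate * h * w))
    for _ in range(n_seeds):
        y, x = rng.randrange(h), rng.randrange(w)
        dy, dx = rng.choice(DIRECTIONS)
        # A random number of adjacent pixels along one direction is affected.
        for step in range(rng.randint(1, max_cluster)):
            yy, xx = y + step * dy, x + step * dx
            if 0 <= yy < h and 0 <= xx < w:
                spike = rng.expovariate(1.0 / mean_intensity)
                out[yy][xx] = min(65535.0, out[yy][xx] + spike)
    return out


flat = [[100.0] * 64 for _ in range(64)]
noisy = add_spike_noise(flat, rate=0.01, mean_intensity=5000.0, seed=11)
changed = sum(1 for y in range(64) for x in range(64)
              if noisy[y][x] != flat[y][x])
```

Raising `rate` while holding `mean_intensity` fixed mirrors the progression described for FIGURE 12.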

2.3.2 Scratch Noise 

Physical handling of the array slides can result in surface scratches. These
typically result in low intensity levels. Scratch-noise intensity is parameterized as a
ratio, κ_sc, giving the background-to-scratch-noise intensity level. Other parameters
are the number of strips, strip thickness W_sc, and a random strip length, L_sc, given as
a multiple of the spot size. The latter is modeled as a uniform distribution: L_sc ~
U[L_sc1, L_sc2]. Strips are placed at random positions on the array, and are inclined
according to a (discrete) uniformly random angle, θ_sc ∈ {0°, 45°, 90°, 135°, 180°}.

FIGURE 13 shows the noise for incremental parameter settings: (a) L_sc ~ U[2, 7], κ_sc
= 2.0, W_sc = 4 pixels; (b) L_sc ~ U[5, 10], κ_sc = 3.0, W_sc = 7 pixels; (c) L_sc ~ U[7, 15],
κ_sc = 4.0, W_sc = 10 pixels. The number of strips is fixed at 7.
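
A rough sketch of scratch placement with the FIGURE 13(a) settings is given below. The helper `add_scratch` is hypothetical. Overwriting the scratched pixels with background / κ_sc, and stamping a small square at each step to obtain the strip thickness, are simplifying assumptions of this sketch.

```python
import math
import random

ANGLES_DEG = (0, 45, 90, 135, 180)


def add_scratch(image, background, kappa_sc, length_px, width_px, rng):
    """Draw one scratch strip into image (modified in place and returned)."""
    h, w = len(image), len(image[0])
    theta = math.radians(rng.choice(ANGLES_DEG))
    dx, dy = math.cos(theta), math.sin(theta)
    x0, y0 = rng.uniform(0, w - 1), rng.uniform(0, h - 1)
    level = background / kappa_sc  # background-to-scratch ratio kappa_sc
    half = width_px // 2
    for step in range(int(length_px)):
        cx = int(round(x0 + step * dx))
        cy = int(round(y0 + step * dy))
        for yy in range(cy - half, cy - half + width_px):
            for xx in range(cx - half, cx - half + width_px):
                if 0 <= yy < h and 0 <= xx < w:
                    image[yy][xx] = level
    return image


rng = random.Random(5)
spot_size = 10  # assumed average spot size in pixels
image = [[400.0] * 200 for _ in range(200)]
for _ in range(7):  # number of strips fixed at 7
    length = rng.uniform(2, 7) * spot_size  # L_sc ~ U[2, 7] spot sizes
    add_scratch(image, background=400.0, kappa_sc=2.0,
                length_px=length, width_px=4, rng=rng)
values = {v for row in image for v in row}
```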



2.3.3 Snake Noise 

Fine fabric dust particles on the slides can create snake-tailed strips on laser
excitation. These strips are normally of higher intensity than the signal level. To
simulate this noise, an equiprobable multi-directional snake noise has been
generated consisting of some number, N_seg, of segments. Analogously to scratch
noise, the parameters are the intensity, parameterized as a ratio, κ_sn, giving the
average-signal-to-snake-noise intensity level, the number of snakes, snake thickness
W_sn, and a random length, L_sn, given as a multiple of the spot size. The latter is
modeled as a uniform distribution: L_sn ~ U[L_sn1, L_sn2]. FIGURE 14 shows the noise
for incremental parameter settings: (a) N_seg = 5, L_sn ~ U[5, 10], κ_sn = 0.50, W_sn = 2
pixels; (b) N_seg = 10, L_sn ~ U[5, 30], κ_sn = 0.33, W_sn = 3 pixels; (c) N_seg = 15, L_sn ~
U[15, 50], κ_sn = 0.25, W_sn = 5 pixels.



2.3.4 Smoothing Function 

Addition of various noise types makes the microarray highly peaked with
high pixel differences. This stark irregularity can be mitigated by smoothing the
image with either a flat or pyramidal convolution kernel. The kernels are shown in
FIGURE 15. The effect of smoothing is illustrated in FIGURE 16, where the 3D
profile of an originally noisy image is shown, along with versions smoothed by flat
and pyramidal kernels. Either smoothing kernel can be chosen.
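
Smoothing with a flat or a pyramidal kernel can be sketched as below. The 3×3 kernel values are assumed typical forms and are not taken from FIGURE 15; replicating the border pixels at the image edge is likewise an assumption.

```python
# Assumed 3x3 kernels: equal weights (flat) and center-weighted (pyramidal).
FLAT = [[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]]
PYRAMID = [[1, 2, 1],
           [2, 4, 2],
           [1, 2, 1]]


def smooth(image, kernel):
    """Convolve image (list of rows) with a normalized square kernel."""
    h, w = len(image), len(image[0])
    k = len(kernel) // 2
    norm = float(sum(sum(row) for row in kernel))
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j, krow in enumerate(kernel):
                for i, kv in enumerate(krow):
                    # Clamp coordinates so border pixels are replicated.
                    yy = min(max(y + j - k, 0), h - 1)
                    xx = min(max(x + i - k, 0), w - 1)
                    acc += kv * image[yy][xx]
            out[y][x] = acc / norm
    return out


# A single bright pixel (a spike) on an otherwise empty 5x5 image.
img = [[0.0] * 5 for _ in range(5)]
img[2][2] = 144.0
flat_out = smooth(img, FLAT)
pyr_out = smooth(img, PYRAMID)
```

The flat kernel spreads the spike evenly over its 3×3 neighborhood, while the pyramidal kernel keeps more of the energy at the center; both preserve the total intensity away from the border.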

2.4. Image Generation and Parameter I/O

Parameters governing the effects described in the preceding sections form
the input (through a file) to the synthetic array software. These include parameters
for array dimensions, shape parameters, and noise processes. Relevant information,
such as spot size, position, various drifts (center hole, spot), noise processes
(foreground, spike, snake, scratch, etc.), and chord rate, is recorded for every spot
printed on the synthetic array. Block controlling parameters and the array
information are also recorded. The recorded information contains the true signal for
the synthetic microarray. This can be used subsequently to analyze various signal
processing tools.

TIFF (tagged image file format) is widely used due to platform
independence and flexibility of data representation. The synthetic images are
generated in TIFF with sample (pixel) resolution of two bytes for every color (R, G).
Both monochrome and color images with interlaced channel information (R, G, with
dummy B) are generated. Standard freeware routines (http://www.libtiff.org) are
used to generate these formats. The image file is written in blocks, where the size of
the block (commonly called "strip") is set equal to the image width. The image data
is written in the native order (big-endian or little-endian) of the host CPU on which
the library is compiled. Image data quality is maintained by disabling compression
and other special options available in these routines and formats.

2.5. Summary of Model Parameters 

The cDNA microarray printing process can be categorized and grouped into 
independent events. Each event is probabilistically described by assigning a 
distribution, as previously described. Due to the physical nature of the process, there 
exist variations between events. This variation is described by randomization of the 
controlling parameters (second level randomization). The parameter randomization 
can be broadly grouped as (i) randomization at spot level, (ii) randomization at block 
level, (iii) randomization at array level. The parameters are grouped and
mathematically described in TABLE 1.

TABLE 1. Parameter settings for the cDNA microarray simulation

Each entry lists the simulation item, its parameter descriptions, and its distribution.

Level: SPOT

  Spot Size
    S : Spot radius with (μ_s, σ_s)
    Distribution: S ~ N(μ_s, σ_s)

  Spot Drift
    δ_x, δ_y : Drifting level
    d_a, d_b : Percentage of spot radius
    P_d : Drift activation probability
    D_x, D_y : Relative drifting
    (X'_1, Y'_1) : Drifted center coordinates
    (X'_2, Y'_2) : Second channel, where (X, Y) is the predefined spot center
      coordinates
    Distribution: δ_x, δ_y ~ U(d_a, d_b)
                  D_x = δ_x × S × U[−1, 1]
                  D_y = δ_y × S × U[−1, 1]

  Inner Hole Size
    H, V : Horizontal and vertical axis of the inner elliptical hole
    Distribution: H ~ N(μ_H, σ_H), V ~ N(μ_V, σ_V)

  Inner Hole Drift
    X_C, Y_C : Ideal spot center
    X_R, Y_R : First channel coordinates
    X_G, Y_G : Second channel coordinates
    δ_xG, δ_yG, δ_xR, δ_yR : Drift level set at the block level
    Distribution: X_R = X_C + δ_xR, Y_R = Y_C + δ_yR

  Chord Removal
    P_chord : Chord removal probability
      {p_k : probability of k chords to be removed from a target spot}
    Distribution: P_chord = {p_0, p_1, p_2, p_3, p_4}, where
                  p_0 + p_1 + p_2 + p_3 + p_4 = 1
    L : Chord length
    θ : Chord position
    Distribution: L ~ B(α_l, β_l), θ ~ U(0, 2π)

  Spot Intensity
    β : Mean intensity for the assumed cell system
    R_k, G_k : Spot (fixed) signal intensities for both channels
    Distribution: I_k ~ Exp(β)
    α : Coefficient of variation of signal intensity in the system

  Outlier's Intensity
    p_outlier : Outlier activation probability
    Distribution: Equal probability at 0.05 to 0.10
    b_k : Outlier control level
    Distribution: b_k ~ Beta(1.7, 4.8)
    t_k : Targeted outlier expression ratio, with equal probability of +/− sign
    R'_k, G'_k : Outlier signal intensities for both channels
    Distribution: as given by Equation 3

  Channel Conditioning
    R''_k, G''_k : Pre-normalized signal intensity of the spots on red, green
      channels
    a_0, a_1, a_2, and a_3 : Parameters for the response characteristic function

< 



where 03 > J 



  Spot Signal Variation (foreground noise)
    SR_k, SG_k : Pixel-wise (x, y) signal intensity
    α_sd : Within-spot signal coefficient of variation

  Edge Enhancement
    W_ed : Level of enhancement, parameter (μ_e) set for the block
    N_e : Number of pixels enhanced
    Distribution: W_ed ~ N(μ_e, 1)

  Edge Noise
    Apply edge noise at the set level (δ_ed)


Level: BLOCK

  Radius Parameters
    μ_s, κ_s : Mean and radius deviation factor
    s_a, s_b : Bounds of radius, set by block size and inter-spot gap

  Chord Parameters
    N_c : Chord rate picked with equal probability
    α_l, β_l : Chord distributional parameters
    Distribution: N_c ~ U{0, 1, 2, 3, 4} having weights {p_0, p_1, p_2, p_3, p_4}
                  α_l ~ U(a_α, b_α), β_l ~ U(a_β, b_β)

  Inner Hole Parameters
    μ_H, μ_V, σ_H, σ_V : Parameters for inner elliptical hole
    μ_s : Mean spot radius in the block
    Distribution: σ_H ~ U(p_a, p_b), σ_V ~ U(p_a, p_b)

  Drift Parameters
    δ_xG, δ_yG, δ_xR, δ_yR : Drift level
    i, j : Percentage of the spot radius
    Distribution: δ_c ~ U[i, j]
                  δ_xG = δ_c × U[−1, 1], δ_yG = δ_c × U[−1, 1]

  Enhancement
    l_a, l_b : Range of intensity ratio. Sets the mean level of enhancement for a
      block
    Distribution: μ_e ~ U(l_a, l_b)


Level: ARRAY

  Physical Dimensions
    B_w, B_h : Block size, width and height (distance between first spot
      centers of any two blocks)
    M_l, M_r, M_t, M_b : Margin settings (left, right, top, bottom)
    N_pin, N_row : Number of pins in an array, printed equally across
      N_row number of rows
    NS_w, NS_h : Number of spots along the width (NS_w) and height
      (NS_h) of the block
    Typical setting for an 8-block, 2-row array (in pixels):
      B_h, B_w = 900
      M_l, M_r, M_t, M_b = 100

  Signal to Noise Ratio
    SNR : Signal to noise level is set for an array

  Interspot Distance
    G_sp : Interspot distance, set for an array

  Background
    Background intensity, with parameters set for an array
    Background level parameter settings:
      - Flat fluorescent background
      - Functional background g(x, y): choice of parabolic, positive or
        negative slant surface function

  Spike Noise
    L_spi : Level of spike noise (set in terms of percentage of total pixels)
    N_s : Intensity of the spike noise
    r : Noise rate
    Distribution: r ~ U[a, b]
    W_spi : Width of the noise cluster

  Edge Noise
    δ_ed : Set the controlling parameter; δ_ed is set as a percentage of maximum
      intensity value

  Snake Noise
    N_sn : Number of snake tails in an image
    I_sn : Intensity of the noise tail
    κ_sn : Average signal-to-snake-noise intensity level
    L_sn : Length of the segment expressed as multiples of average spot size
    W_sn : Width of the snake noise tail

  Scratch Noise
    N_sc : Number of scratch tails in an image
    I_sc : Intensity of the scratch noise
    Distribution: I_sc ~ N(μ_sc, σ_sc)
    κ_sc : Average background-to-scratch-noise intensity level
    L_sc : Length of the segment in units of average size of the spots
    W_sc : Width of the scratch noise
    θ : Scratch noise inclination
    Distribution: θ ∈ U{0, 45, 90, 135, 180}°


Each noise type is categorized into one of the three groups and individually
parameterized. Some are related to another noise parameter; others are independent.
Each noise parameter is assigned a statistical distribution fitting its nature. For
instance, consider spot radius. At the Spot Level, it obeys a normal distribution (μ_s, σ_s),
where the mean spot radius (μ_s) is randomly picked over a small range (say s_a to s_b) at the
Block Level. This spot size range is set for an array depending on a user setting: the
number of spots in a block (NS_w, NS_h) at the Array Level. If a noise type needs to be
suppressed, then the corresponding parameters can be set small to nullify its effect.
For example, Inner Spot Hole follows a normal distribution along its vertical (μ_V,
σ_V) and horizontal (μ_H, σ_H) axes. Its parameters are randomly picked from a preset
range and related to the mean spot radius (μ_s) at the Block Level [μ_H ~ U(L_a,
L_b) × μ_s, μ_V ~ U(L_a, L_b) × μ_s]. For small or negligible doughnut holes, this preset
range can be set small, or even null for perfect spots. The table is perused from spot
level to the array level, tagging through the corresponding parameters, as indicated
in the above examples.
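
The two-level randomization of the spot radius described above can be sketched as follows. The function name `draw_spot_radii`, the example bounds, and the reading of the deviation factor as a fraction of the block mean (σ_s = deviation factor × μ_s) are assumptions of this sketch.

```python
import random


def draw_spot_radii(n_blocks, spots_per_block, s_a, s_b, dev_factor, seed=None):
    """Draw a block-level mean radius, then spot-level radii around it."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_blocks):
        # Block level: the mean radius is picked over a small range.
        mu_s = rng.uniform(s_a, s_b)
        # Assumed link between the deviation factor and sigma_s.
        sigma_s = dev_factor * mu_s
        # Spot level: each radius is normally distributed around the block mean.
        radii = [max(0.0, rng.gauss(mu_s, sigma_s))
                 for _ in range(spots_per_block)]
        blocks.append({"mu_s": mu_s, "radii": radii})
    return blocks


# Eight blocks of 35 x 25 spots, as in the array-level settings of Section 3.
blocks = draw_spot_radii(8, 35 * 25, s_a=10.0, s_b=12.0, dev_factor=0.05, seed=2)
# Setting the deviation factor to zero suppresses the spot-level variation.
uniform_blocks = draw_spot_radii(2, 10, s_a=10.0, s_b=12.0, dev_factor=0.0, seed=2)
```

The second call illustrates the remark that a noise type can be nullified by setting its controlling parameter small: with a zero deviation factor every spot in a block takes the block mean radius.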

3. Examples of Simulated Microarrays and Image Analysis 

The described process and noise effects are controlled by appropriate
parameter selection. Depending on the parameter setting, the arrays can be roughly
classified as ideal, average, or noisy. Given a good printing run (substantially no
mechanical deposition problems), a relatively mature hybridization protocol, and
good RNA samples, along with a scanner with minimal optical warping, focusing, and
integration problems, a high-quality (ideal) microarray image is expected. The
corresponding simulated ideal image will have a flat mean background with typical
auto-fluorescence variation (< 10% of mean background level, but no less than the
square root of the mean background level), minimum spike/scratch/snake noise,
little edge enhancement, and substantially no channel conditioning problems. For
average image quality, one would expect larger background variation and possibly a
slanted mean level. There will also be more spike/scratch/snake noise interfering
with signal spots. In a noisy setting, besides higher noise levels for various possible
interference, one would also expect uneven background level (e.g., parabolic
function), heavy spot deformity (chord cuts, edge enhancement, and large inner
holes), and different channel conditioning (such as the banana shape in the intensity
scatter plot shown in FIGURE 11b).

FIGURE 17 shows two microarrays generated with NS_h = 35 rows and NS_w =
25 columns, at B_h = B_w = 900 pixels per block. Array boundaries are set at (M_l, M_r,
M_t, M_b) = (100, 100, 100, 100). By choosing parameters, two different array
qualities have been generated. Part (a) illustrates an ideal microarray image with
normal background and parameters β = 3000, SNR = 2.0, α = 0.05, G_sp = 6, P_d =
0.05, (d_a, d_b) = (2, 15), (k_b1, k_b2) = (10, 10), p_outlier = 0.05, L_spi = 0.3%, δ_ed = 0.3,

(f_a1, f_b1, f_c1, f_d1) = (2, 8, 2, 6)
(f_a2, f_b2, f_c2, f_d2) = (2, 8, 2, 8)
(a_0, a_1, a_2, a_3) = (0, 1, −1, 1)
(b_0, b_1, b_2, b_3) = (0, 1, −1, 1)
(l_a, l_b, N_e) = (1, 3, 3)
(p_0, p_1, p_2, p_3, p_4) = (0.97, 0.03, 0, 0, 0)
(κ_SN, L_SN1, L_SN2, W_SN, N_SN) = (0.25, 10, 50, 1, 2)
(κ_SC, L_SC1, L_SC2, W_SC, N_SC) = (3, 5, 35, 3, 1)

Part (b) illustrates a noisy microarray image with parabolic background and
parameters: β = 3000, SNR = 1.1, α = 0.25, G_sp = 4, P_d = 0.4, (d_a, d_b) = (15, 100),
(k_b1, k_b2) = (25, 25), p_outlier = 0.7, L_spi = 15%, δ_ed = 0.03,

(f_a1, f_b1, f_c1, f_d1) = (6, 12, 8, 20)
(f_a2, f_b2, f_c2, f_d2) = (6, 12, 8, 20)
(a_0, a_1, a_2, a_3) = (0, 500, −1, 1)
(b_0, b_1, b_2, b_3) = (0, 10, −1, 1)
(l_a, l_b, N_e) = (10, 40, 3)
(p_0, p_1, p_2, p_3, p_4) = (0.05, 0.3, 0.25, 0.25, 0.15)
(κ_SN, L_SN1, L_SN2, W_SN, N_SN) = (0.25, 60, 110, 2, 10)
(κ_SC, L_SC1, L_SC2, W_SC, N_SC) = (0.25, 60, 110, 2, 10)

To illustrate how the simulation can be used to analyze microarray image
software, the ArraySuite [11] software is applied to extract the image intensities and
ratios from the image and then these are compared to the corresponding intensities
and ratios used for simulation. The ideal case is used to illustrate the utility of the
simulation. In FIGURE 18a, intensities from one fluorescent channel have been
extracted (y-axis) and plotted against the simulation signal intensities. The extracted
signal generally corresponds well to the simulated signal, with some variation. After
excluding intensities less than 300, the mean and standard deviation of the
difference between the two log10-transformed intensities are 0.016 (or 10^{0.016} =
1.038) and 0.038 (or 10^{0.038} = 1.09), respectively. The ratio comparison is given in
FIGURE 18b. When signal intensity is weak (less than 300), various noise
components in the simulation process affect the accuracy of the signal extraction
program. Since the problem may be unavoidable, a measurement quality metric may
be used to provide confidence in downstream data analysis. In this case, if the signal
intensity is less than 300, then the noise interaction is significant.
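
The comparison statistic quoted above can be computed as in the following sketch. The function names are hypothetical, the sample intensities are made up for illustration, and excluding a pair when either intensity is below the threshold is an assumption about how the cutoff of 300 was applied.

```python
import math
import statistics


def log10_difference_stats(extracted, simulated, min_intensity=300.0):
    """Mean and standard deviation of log10(extracted) - log10(simulated)."""
    diffs = [
        math.log10(e) - math.log10(s)
        for e, s in zip(extracted, simulated)
        if e >= min_intensity and s >= min_intensity
    ]
    return statistics.mean(diffs), statistics.stdev(diffs)


def as_fold(log10_value):
    """Convert a log10 difference back to a multiplicative factor."""
    return 10.0 ** log10_value


# Illustrative data only: every extracted value is 3.8% above the simulated one.
simulated = [350.0, 800.0, 1500.0, 4000.0, 120.0]
extracted = [s * 1.038 for s in simulated]
mean_diff, sd_diff = log10_difference_stats(extracted, simulated)

# The figures quoted in the text, converted back from the log10 scale.
mean_fold = as_fold(0.016)
sd_fold = as_fold(0.038)
```

A mean log10 difference of 0.016 thus corresponds to extracted intensities that are on average about 3.8% higher than the simulated ones.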



4. Conclusion 

Modeling and simulation of microarray image formation is a key to 
benchmarking various signal processing tools being developed to estimate cDNA 
signal spots. Using a model to describe the signal ground truth not only helps in 
evaluating these tools, but also facilitates the understanding of various process 
interactions. To illustrate how the image-simulation program presented in this 
document can be used in the development of image-analysis software, an actual case 
is described. 

The simulation program has been used extensively in the design of the
microarray image-analysis program used at the National Human Genome Research
Institute. This has been done by testing the accuracy of the analysis program on
simulated images exhibiting troublesome noise conditions and then tuning the
program to achieve better results. One such application concerns large and
overlapping spots, as illustrated in FIGURE 19(a), which shows part of an actual
hybridized image in which some spots are substantially larger than intended owing
to randomness in the cDNA deposition procedure. This defect causes various
problems, one being poor background estimation. This problem is illustrated by
simulating an image with large spot size variation and drifting conditions [FIGURE
19(b)]. If the image analysis program extracts the local background by averaging the
region around the bounding box (which was used as a starting condition in an earlier
version of the NHGRI program), an elevated background average may be obtained
since the bounding box may overlap neighboring targets that are large in size and
strong in expression level. An additional problem is that some weak targets may not
be detected [FIGURE 19(c)]. Based on these considerations, the program has been
modified to calculate the four average intensities from the four corners and the four
average intensities from the four sides of the bounding box, and then take the
minimum among these as the initial estimation of the local background. A
histogram-based method is then invoked around the initial estimated background to
further improve the estimation [11]. The output from FIGURE 19(b) according to
the modified program is shown in FIGURE 19(d): the weak target is detected and
there is improved local background estimation for the spots.
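
The corner-and-side minimum rule can be sketched as follows. The function names, the patch width `m`, and the exact placement of the eight regions inside the bounding box are assumptions; the sketch covers only the initial estimate, not the subsequent histogram-based refinement.

```python
def mean_region(image, x0, y0, x1, y1):
    """Average of image over the half-open rectangle [x0, x1) x [y0, y1)."""
    vals = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(vals) / len(vals)


def initial_local_background(image, box, m=2):
    """Minimum of the four corner and four side averages of the bounding box."""
    x0, y0, x1, y1 = box
    regions = [
        (x0, y0, x0 + m, y0 + m),          # top-left corner
        (x1 - m, y0, x1, y0 + m),          # top-right corner
        (x0, y1 - m, x0 + m, y1),          # bottom-left corner
        (x1 - m, y1 - m, x1, y1),          # bottom-right corner
        (x0 + m, y0, x1 - m, y0 + m),      # top side
        (x0 + m, y1 - m, x1 - m, y1),      # bottom side
        (x0, y0 + m, x0 + m, y1 - m),      # left side
        (x1 - m, y0 + m, x1, y1 - m),      # right side
    ]
    return min(mean_region(image, *r) for r in regions)


def ring_mean(image, box, m=2):
    """Earlier approach: plain average over the whole border ring of the box."""
    x0, y0, x1, y1 = box
    vals = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)
            if not (x0 + m <= x < x1 - m and y0 + m <= y < y1 - m)]
    return sum(vals) / len(vals)


# 12x12 patch: background 50, a target in the middle, and a large bright
# neighbor overlapping the right edge of the bounding box.
img = [[50.0] * 12 for _ in range(12)]
for y in range(4, 8):
    for x in range(4, 8):
        img[y][x] = 500.0
for y in range(12):
    for x in range(10, 12):
        img[y][x] = 1000.0

box = (0, 0, 12, 12)
averaged = ring_mean(img, box)
robust = initial_local_background(img, box)
```

In this toy case the ring average is pulled up to 335 by the overlapping neighbor, while the minimum rule still returns the true background of 50.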

5. Alternatives 

The various steps described herein can be achieved via execution of
computer-executable instructions. Such instructions can be stored in computer-readable
media (e.g., RAM, ROM, hard disk, CD-ROM, DVD-ROM, or the like).
Although exemplary embodiments of the invention and their advantages are
described herein in detail, various alterations, additions, and omissions can be made
without departing from the spirit and scope of the present invention as defined by
the appended claims.

WHAT IS CLAIMED IS:

1. A method of generating a simulated microarray image, the method 
comprising: 

receiving a plurality of simulation parameters; and 

generating the simulated microarray image based at least on the simulation 
parameters. 

2. A computer-readable medium comprising computer-executable 
instructions for performing the method of claim 1. 

3. A method comprising: 

generating a simulated microarray image based on simulation parameters, 
wherein the simulated microarray image is associated with known values; and 

analyzing the simulated microarray image via a microarray imaging
procedure, the analyzing comprising calculating observed values. 

4. A computer-readable medium having computer-executable
instructions for performing the method of claim 3. 

5. The method of claim 3 further comprising: 

comparing the known values with the observed values to benchmark the
microarray imaging procedure. 

6. The method of claim 7 further comprising: 

generating a rating based on results of the comparing, wherein the rating 
indicates effectiveness of the microarray imaging procedure. 

7. The method of claim 3 wherein the values comprise spot intensity 

values. 

8. The method of claim 3 wherein the generating comprises simulating a
fluorescent background level for the simulated microarray image.

9. The method of claim 3 wherein the generating comprises simulating
spots for the simulated microarray image.

10. The method of claim 3 wherein the generating comprises simulating 
post-processing phenomena for the simulated microarray image. 

11. A method for simulating a microarray, comprising: 
defining a plurality of parameters; 

generating a microarray according to the parameters using an imaging 
procedure; 

comparing the microarray to a known value; and 

evaluating the imaging procedure in response to the comparison. 

12. A computer-readable medium having computer-executable
instructions for performing the method of claim 11.

AMENDED CLAIMS

[received by the International Bureau on 07 October 2003 (07.10.2003);
original claims 1 amended; new claims 3-23 added (5 pages)]

WHAT IS CLAIMED IS:

1. A computer-implemented method of generating a simulated
microarray image, the method comprising:

receiving a plurality of simulation parameters; and

generating the simulated microarray image based at least on the simulation
parameters.

2. A computer-readable medium comprising computer-executable
instructions for performing the method of claim 1.

3. The computer-implemented method of claim 1 wherein the simulated
microarray image is associated with known values, the method further comprising:

analyzing the simulated microarray image via a microarray imaging
procedure, the analyzing comprising calculating observed values; and

comparing the known values with the observed values to benchmark the
microarray imaging procedure.

4. The computer-implemented method of claim 3 wherein:
the known values comprise signal intensities;

the observed values comprise signal intensities; and
the comparing compares the signal intensities of the known values with the
signal intensities of the observed values.

5. The computer-implemented method of claim 1 wherein the simulated
microarray image simulates random perturbations in array preparation, printing, and
scanning.

6. The computer-implemented method of claim 1 wherein the simulated
microarray image simulates background noise.

7. The computer-implemented method of claim 1 wherein the simulated
microarray image simulates radius variation of cDNA deposition spots.

AMENDED SHEET (ARTICLE 19)

8. The computer-implemented method of claim 1 wherein the simulated
microarray image simulates spot drift of cDNA deposition spots.

9. The computer-implemented method of claim 1 wherein the simulated 
microarray image simulates center core variation of cDNA deposition spots. 

10. The computer-implemented method of claim 1 wherein the simulated 
microarray image simulates chord removal of cDNA deposition spots. 

11. The computer-implemented method of claim 1 wherein the simulated
microarray image simulates edge noise of cDNA deposition spots.

12. The computer-implemented method of claim 1 wherein the simulated
microarray image simulates edge enhancement of cDNA deposition spots.

13. The computer-implemented method of claim 1 wherein the simulated
microarray image simulates signal intensity.

14. The computer-implemented method of claim 1 wherein the simulated
microarray image simulates channel conditioning.

15. The computer-implemented method of claim 1 wherein the simulated
microarray image simulates spike noise.

16. The computer-implemented method of claim 1 wherein the simulated
microarray image simulates scratch noise.

17. The computer-implemented method of claim 1 wherein the simulated
microarray image simulates snake noise.

18. The computer-implemented method of claim 1 wherein the simulated
microarray image simulates smoothing.

19. The computer-implemented method of claim 1 wherein the
generating comprises randomization at a spot level of the simulated microarray
image.

20. The computer-implemented method of claim 1 wherein the
generating comprises randomization at a block level of the simulated microarray
image.

21. The computer-implemented method of claim 1 wherein the
generating comprises randomization at an array level of the simulated microarray
image.

22. A software system for generating a simulated microarray image, the
system comprising:

simulation parameters; and

a simulated microarray image generator operable to generate a simulated
microarray image based at least on the simulation parameters.

23. A software system for generating a simulated microarray image, the
system comprising: 

means for storing simulation parameters; and 

means for generating a simulated microarray image based at least on the 
simulation parameters. 

24. A method comprising: 

generating a simulated microarray image based on simulation parameters,
wherein the simulated microarray image is associated with known values; and

analyzing the simulated microarray image via a microarray imaging
procedure, the analyzing comprising calculating observed values.

25. A computer-readable medium having computer-executable
instructions for performing the method of claim 24.

26. The method of claim 24 further comprising:

comparing the known values with the observed values to benchmark the
microarray imaging procedure.

27. The method of claim 28 further comprising:

generating a rating based on results of the comparing, wherein the rating
indicates effectiveness of the microarray imaging procedure.

28. The method of claim 24 wherein the values comprise spot intensity
values.

29. The method of claim 24 wherein the generating comprises simulating
a fluorescent background level for the simulated microarray image.

30. The method of claim 24 wherein the generating comprises simulating
spots for the simulated microarray image.

31. The method of claim 24 wherein the generating comprises simulating
post-processing phenomena for the simulated microarray image.

32. A method for simulating a microarray, comprising: 
defining a plurality of parameters; 

generating a microarray according to the parameters using an imaging 
procedure; 

comparing the microarray to a known value; and

evaluating the imaging procedure in response to the comparison.

33. A computer-readable medium having computer-executable
instructions for performing the method of claim 32.

Figure 1. Above figure shows the steps involved in generating the microarray

FIG. 2

Figure 2. Above figures show various background noises. The mean SNR is set at 1.0
for the above slides. The slides have the following settings: (a) parabolic background noise,
(b) positive slope background, (c) negative slope background, all with global noise
parameter. The background deviation factor is set at k_b1 = k_b2 = 3.0.

FIG. 3

Figure 3. Above example shows different noise settings for the spot's inner hole, where (a)
uses the global background parameter to fill the center hole, and (b) uses the local
background for filling the center hole. The background noise is set to a slope.



WO 03/097850 



PCT/US03/01784 






FIG. 4 
Fig. 4. cDNA microarray spot model. 
FIG. 5 

(a) (b) (c) 

Figure 5. Above figure shows the variability in spot size and spread from its size. The spot 
radius distribution is automatically set depending on the number of spots in a slide (width, 
height). The above example has (a) [illegible], μ ~ U[23.3 24.3], (b) (20,25), μ ~ U[12.[illegible]] 
and (c) (25,45), μ ~ U[5.45 6[illegible]], with standard deviation 1%, 7%, 20% of 
radius respectively. 
FIG. 6 

[Beginning of caption missing] (b) [illegible] pixels, μ ~ U[8 9[illegible]], (c) [illegible] 10 pixels, 
μ ~ U[6[illegible] 7[illegible]]. The above example has ([illegible]5, 20) rows, columns 
respectively, [illegible] = 0.05. 
FIG. 7 

Figure 7. Above figure shows the effect of radius drift variation (parameters [illegible]). Above 
examples have the following settings: (a) (0.05, 5, 100), (b) (0.25, 1[illegible], 100), (c) (0.5, 50, 100). 
As the activation probability with drift range is set high, the spot drifts away from its 
center. 
FIG. 8 

Figure 8. Above figure shows different chord rate settings for each of the slides. The 
probability weights for (0,1,2,3,4) chords were set at the following levels: (a) (0.7, 0.3, 0.0, 
0, 0), (b) (0.[illegible], 0.[illegible], 0.[illegible]5, 0.15, 0), (c) (0.0, 0.[illegible], 0.[illegible], 0.[illegible], 0.2) respectively. Chord rate is [caption truncated] 

Figure 9. Figure shows the edge noise on the spots. Noise controlling parameter [symbol illegible] can be 
set from [0, 1.0]. Above example shows an increased edge noise effect [illegible] 0.25. 
FIG. 10 

Figure 10. Fluorescent detection response characteristic functions. In all figures, the 
middle (blue) curve is the reference function with parameters of [illegible] 100, 1). 
Also, in all figures, the x-axis is the input signal intensity, the y-axis is the 
observed signal intensity, and both are in log scale. (a) Delayed response at various 
levels, with fixed [illegible] 0 and 1. (b) Different amplification levels, with fixed a0 = 
0 and [illegible]. (c) Different response curvature [illegible] 
other parameter settings, with fixed a3 = 1. 



"Blue" curve is circled. 
FIG. 11 

Figure 11. Possible scatter plot due to various response characteristics for different 
fluorescent channels. 10,000 data points (gene expression levels) were generated by the 
exponential distribution with mean of 3000. After passing through two fluorescent 
channels (with some response characteristic functions as shown in parts (a) to (c)), data 
variations were added by passing each data point through a normal distribution with the 
standard deviation to be 15% of mean expression signal. (a) Without any alteration (or 
equivalently, set parameters for the response function to be (a0, a1, a2, a3) = (0, 1, -1, 1)), 
and assume the signal intensities from red channel and green channel are equivalent (a 
simulated self-self experiment). (b) Banana-shape: intensity in green channel passes a 
response function with parameters (a0, a1, a2, a3) = (0, 500, [illegible], 1), where red channel 
takes the parameters (0, 10, -1, 1). (c) Sigmoid-shape: the red channel's response 
function with parameters (0, 100[illegible], -0.7, 1), and the green channel with (0, 100[illegible], -0.[illegible] [caption truncated] 
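
The data-generation recipe in the Figure 11 caption, panel (a), can be sketched as follows. This is only an illustrative sketch, not the applicants' implementation: the function name `simulate_self_self`, the `seed` argument, and the reading that the normal distribution is centered on each expression level with a standard deviation of 15% of that level are assumptions. No response function is applied, matching the "without any alteration" case.

```python
import random


def simulate_self_self(n_points=10_000, mean_expression=3000.0, cv=0.15, seed=0):
    """Return (red, green) intensity lists for a simulated self-self experiment.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    red, green = [], []
    for _ in range(n_points):
        # Expression level drawn from an exponential distribution
        # whose mean is mean_expression (rate = 1 / mean).
        level = rng.expovariate(1.0 / mean_expression)
        # Assumption: each channel observes the same level with independent
        # Gaussian variation, sigma = cv * level (15% of the level).
        red.append(rng.gauss(level, cv * level))
        green.append(rng.gauss(level, cv * level))
    return red, green


if __name__ == "__main__":
    red, green = simulate_self_self()
    print(len(red), sum(red) / len(red))
```

Plotting `green` against `red` on log axes would give a scatter around the diagonal; the banana and sigmoid shapes of panels (b) and (c) would additionally require the per-channel response functions, which are not reproduced here.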
FIG. 12 

(a) (b) (c) 

Figure 12. Above figure shows increased spike noise levels. (a) level of 0.[illegible]%, 
(b) level of 5%, (c) level of 10%; exponential rate range is maintained for the above [caption truncated] 
FIG. 13 

Figure 13. Figure shows scratch noise with its parameter settings. Number of scratches is maintained 
at 7 in the above examples. Following [illegible]: (a) [illegible] = 1.5, W = 3 pixels, 
(b) L ~ U[[illegible] 15], [illegible] = 2.5, W = 7 pixels, (c) L ~ U[[illegible] 45], [illegible] = 4.[illegible], W = [illegible] pixels. The noise [caption truncated] 
FIG. 14 

(a) (b) (c) 

Figure 14. Above example shows different parameter settings for snake noise. In this example (a) 
N = 5, L ~ U[5 16], [illegible] = 0.5, W = 2 pixels, (b) N = 10, L ~ U[5 30], [illegible] = 0.[illegible]3, W = 3 
pixels, (c) N = [illegible]5, L ~ U[5 [illegible]0], [illegible], W = 5 pixels, [illegible] 
randomly chosen with equal probability to [caption truncated] 
FIG. 15 

Fig. 15. Example shows the 3x3 convolution kernel for (a) flat function 
and (b) pyramidal function. 
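
Smoothing with a 3x3 flat or pyramidal kernel, as named in the Fig. 15 caption, might be sketched as below. The kernel weights are not legible in this text, so the values used here (a uniform 1/9 kernel and a 1-2-1 pyramid normalized by 16) are common textbook choices assumed for illustration, and the names `FLAT_KERNEL`, `PYRAMID_KERNEL`, and `smooth`, as well as the edge-replication border handling, are hypothetical.

```python
# Assumed kernel weights; the figure's actual values are not legible here.
FLAT_KERNEL = [[1 / 9] * 3 for _ in range(3)]
PYRAMID_KERNEL = [
    [1 / 16, 2 / 16, 1 / 16],
    [2 / 16, 4 / 16, 2 / 16],
    [1 / 16, 2 / 16, 1 / 16],
]


def smooth(image, kernel):
    """Convolve a 2-D list of pixel values with a symmetric 3x3 kernel.

    Border pixels are handled by replicating the nearest edge pixel
    (an assumption; other border policies are equally possible).
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # Clamp neighbor coordinates to stay inside the image.
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    acc += kernel[dr + 1][dc + 1] * image[rr][cc]
            out[r][c] = acc
    return out


if __name__ == "__main__":
    # A single bright pixel stands in for a noise spike.
    img = [[0.0] * 5 for _ in range(5)]
    img[2][2] = 16.0
    for row in smooth(img, PYRAMID_KERNEL):
        print(row)
```

Because both assumed kernels sum to one, smoothing spreads a spike over its neighbors without changing the total intensity away from the image border; the pyramid kernel keeps more weight at the center than the flat one.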



FIG. 16 



(a) Noised (b) Flat Function (c) Pyramid Function 

Fig. 16. Example shows the 3D profile before and after smoothing, where (a) noised, (b) flat function, (c) pyramid function. 
FIG. 17 

Figure 17. This example shows full size array simulation with different parameter settings. 
Depending on the parameters the arrays are called as average and noisy in quality: (a) good quality 
has SNR of 2.0, with normal background, spike noise [illegible]; noisy array with SNR of 1.[illegible], 
with parabolic background noise, spike noise [illegible] = 15%. 
FIG. 18 

Figure 18. Comparison between simulated signal (model setting) vs. extracted signal from 
microarray image analysis program. (a) Signal extracted from one fluorescent channel (y- 
axis) comparing to the signal used for simulation in the same channel (x-axis). (b) Ratios 
from microarray image analysis program (y-axis) comparing to the ratios generated by the 
simulation (x-axis). 
FIG. 19 

Figure 19. [Caption largely illegible; the legible fragments refer to (a) part of a simulated image, 
a program that produces undetected spots, and an extraction program that is more accurate.] 